Abstract

In this paper, we consider the problem of joint scheduling and resource allocation in the OFDMA
downlink, with the goal of maximizing an expected long-term goodput-based utility subject to an instan- 
taneous sum-power constraint, and where the feedback to the base station consists only of ACK/NAKs 
from recently scheduled users. We first establish that the optimal solution is a partially observable 
Markov decision process (POMDP), which is impractical to implement. In response, we propose a greedy 
approach to joint scheduling and resource allocation that maintains a posterior channel distribution 
for every user, and has only polynomial complexity. For frequency-selective channels with Markov 
time-variation, we then outline a recursive method to update the channel posteriors, based on the 
ACK/NAK feedback, that is made computationally efficient through the use of particle filtering. To 
gauge the performance of our greedy approach relative to that of the optimal POMDP, we derive a 
POMDP performance upper-bound. Numerical experiments show that, for slowly fading channels, the 
performance of our greedy scheme is relatively close to the upper bound, and much better than fixed- 
power random user scheduling (FP-RUS), despite its relatively low complexity. 

Keywords: OFDMA downlink, scheduling and resource allocation, ACK/NAK feedback, particle 
filters. 
I. Introduction

In the downlink of a wireless orthogonal frequency division multiple access (OFDMA) system, 
the base station (BS) must deliver data to a set of users whose channels may vary in both time 
and frequency. Since bandwidth and power resources are limited, data delivery must be carried 
out efficiently, e.g., by pairing users with strong subchannels and by distributing power across 
users in the most effective manner. Often, the BS must also adhere to per-user quality-of-service 
(QoS) constraints. Overall, the BS faces the challenging problem of jointly scheduling users 
across subchannels, optimizing their modulation-and-coding schemes, and allocating a limited 
power resource to maximize some function of per-user throughputs. 

The OFDMA scheduling-and-resource-allocation problem has been addressed in a number of 
studies that assume the availability of perfect channel state information (CSI) at the BS (e.g., 
[1]-[7]). In practice, however, it is difficult for the BS to maintain perfect CSI (for all users
and all subchannels), since CSI is most easily obtained at the user terminals, and the bandwidth 
available for feedback of CSI to the BS is scarce. Hence, practical resource allocation schemes 
use some form of limited feedback [8], such as quantized channel gains. 

In this work, we consider the exclusive use of ACK/NAK feedback, as provided by the 
automatic repeat request (ARQ) [9] mechanism present in most wireless downlinks. We assume 
standard ARQ,¹ where every scheduled user provides the BS with either an acknowledgment
(ACK), if the most recent data packet has been correctly decoded, or a negative acknowledgment 
(NAK), if not. Although ACK/NAKs do not provide direct information about the state of the 
channel, they do provide relative information about channel quality that can be used for the 
purpose of transmitter adaptation (e.g., [10], [11]). For example, if an NAK was received for a 
particular packet, then it is likely that the subchannel's signal-to-noise ratio (SNR) was below 
that required to support the transmission rate used for that packet. We consider the exclusive use of
ACK/NAK feedback provided by the link layer, because this allows us to completely avoid any 
additional feedback, such as feedback about quantized channel gains. 

There are interesting implications to the use of (quantized) error-rate feedback (like ACK/NAK)
for transmitter adaptation, as opposed to quantized channel-state feedback. With error-rate
feedback, the transmission parameters applied at a given time-slot affect not only the throughput
for that slot, but also the corresponding feedback, which will impact the quality of future
transmitter-CSI, and thus future throughput. For example, if the transmission parameters are
chosen to maximize only the instantaneous throughput, e.g., by scheduling those users that the
BS believes are currently best, then little will be learned about the changing states of other user
channels, implying that future scheduling decisions will be compromised. On the other hand, if
the BS schedules not-recently-scheduled users solely for the purpose of probing their channels,
then instantaneous throughput will be compromised. Thus, when using error-rate feedback, the
BS must navigate the classic tradeoff between exploitation and exploration [12].

¹ The approach we develop in this paper could be easily extended to other forms of link-layer feedback, e.g., Type-I and
Type-II Hybrid ARQ. For simplicity and ease of exposition, however, we consider only standard ARQ.

In this work, we propose a scheme whereby the BS uses ACK/NAK feedback to maintain 
a posterior channel distribution for every user and, from these distributions, performs simul- 
taneous user subchannel-scheduling, power-allocation, and rate-selection. In doing so, the BS 
aims to maximize an expected, long-term, generic utility criterion that is a function of the per- 
user/channel/rate goodputs. Our use of a generic utility-based criterion allows us to handle, e.g., 
sum-capacity maximization, throughput maximization under practical modulation-and-coding
schemes, and throughput-based pricing (e.g., [13]-[15]), as discussed in the sequel. To this 
end, we exploit our recent work [16], which offers an efficient near-optimal scheme for utility- 
based OFDMA resource allocation under distributional CSI. Our use of ACK/NAK-feedback, 
however, makes our problem considerably more complicated than the one considered in [16]. 
For example, as we show in the sequel, the optimal solution to our expected long-term utility- 
maximization problem is a partially observable Markov decision process (POMDP) that would 
involve the solution of many mixed-integer optimization problems during each time-slot. Due 
to the impracticality of the POMDP solution, we instead consider (suboptimal) greedy utility- 
maximization schemes. As justification for this approach, we first establish that the optimal utility 
maximization strategy would itself be greedy if the BS had perfect CSI for all user-subchannel 
combinations. Moreover, we establish that the performance of this perfect-CSI (greedy) scheme 
upper-bounds the optimal ACK/NAK-feedback-based (POMDP) scheme. We then propose a 
novel, greedy utility-maximization scheme whose performance is shown (via the upper bound) 
to be close to optimal. Finally, due to the computational demands of tracking the posterior 
channel distribution for every user, we propose a low-complexity implementation based on 
particle filtering. 
We now describe the relation of our work to the existing literature [17]-[19]. In [17], a
learning-automata-based user/rate scheduling algorithm was proposed to maximize system through- 
put based on ACK/NAK feedback while satisfying per-user throughput constraints. While [17] 
considered a single channel, we consider joint user/rate scheduling and power allocation in a 
multi-channel OFDMA setting. In [18], a state-space-based approach was taken to jointly sched- 
ule users/rates and allocate powers in downlink OFDMA systems under slow-fading channels 
in the presence of ACK/NAK feedback and imperfect subchannel-gain estimates at the BS. In 
particular, assuming a discrete channel model, goodput maximization was considered under a 
target maximum packet-error probability constraint and a sum-power constraint across all
time-slots. Its solution led to a POMDP, which was solved using a dynamic program. While the
approach in [18] is applicable to only goodput maximization under discrete-state channels, ours 
is applicable to generic utility maximization problems under continuous-state channels. Further- 
more, our approach is based on particle filtering and lends itself to practical implementation. 
In [19], the user/rate scheduling and power allocation problem in OFDMA systems with quasi- 
static channels and ACK/NAK feedback was formulated as a Markov Decision Process and an 
efficient algorithm was proposed to maximize achievable sum-rate while maintaining a target 
packet-error-rate and a sum-power constraint over a finite time-horizon. Apart from assuming 
a discrete-state quasi-static channel model, the scope of this work was limited by two other 
assumptions: i) in each time-slot, the BS scheduled only one user across all subchannels for data 
transmission, and ii) all users decoded the broadcast data packet and sent ACK/NAK feedback
to the BS. In contrast, we consider the scenario where multi-user diversity is efficiently exploited 
by scheduling different users across different subchannels, and only the scheduled users report 
ACK/NAK feedback. Furthermore, we consider general utility maximization under continuous- 
state time-varying channels, and propose a polynomial-complexity joint scheduling and resource 
allocation scheme with provable performance guarantees. 

The rest of the paper is organized as follows. In Section II, we outline the system model and,
in Section III, we investigate the optimal scheduling and resource allocation scheme. Due to the
implementation complexity of the optimal scheme, we propose a suboptimal greedy scheme in
Section IV that maintains posterior channel distributions inferred from the received ACK/NAK
feedback. In Section V, we show how these posteriors can be recursively updated via particle
filtering. Numerical results are presented in Section VI, and conclusions are stated in Section VII.
II. System Model

We consider a packetized downlink OFDMA system with a pool of K users. During each time 
slot, the BS (i.e., "controller") transmits packets of data, composed of codewords from a generic 
signaling scheme, through $N$ OFDMA subchannels (with $N \ge K$). Each packet propagates
through a fading channel on the way to its intended mobile user, where the fading channel is 
assumed to be time-invariant over the packet duration, but is allowed to vary across packets in 
a Markovian manner. Henceforth, we will use "time" when referring to the packet index. At 
each time-instant, the BS must decide — for each subchannel — which user to schedule, which 
modulation-and-coding scheme (MCS) to use, and how much power to allocate. 

We assume $M$ choices of MCS, where the MCS index $m \in \{1, \dots, M\}$ corresponds to a
transmission rate of $r_m$ bits per packet and a packet error rate of the form
$\epsilon = a_m e^{-b_m P \gamma}$ under transmit power $P$ and squared subchannel gain (SSG) $\gamma$,
where $a_m$ and $b_m$ are constants [20]. Let $(n, k, m)$ represent the combination of user $k$ and
MCS $m$ over subchannel $n$. In the sequel, we use $P^t_{n,k,m}$, $\gamma^t_{n,k}$, and
$\epsilon^t_{n,k,m}$ to denote, respectively, the power allocated to, the SSG experienced by, and the
error rate of the combination $(n, k, m)$ at time $t$. Additionally, we denote the scheduling
decision by $I^t_{n,k,m} \in \{0, 1\}$, where $I^t_{n,k,m} = 1$ indicates that user/rate $(k,m)$ was
scheduled on subchannel $n$ at time $t$, whereas $I^t_{n,k,m} = 0$ indicates otherwise. Since we
assume that only one user/rate $(k, m)$ can be scheduled on a given subchannel $n$ at a given
time $t$, we have the "subchannel resource" constraint $\sum_{k,m} I^t_{n,k,m} \le 1$ for all $n, t$.
We also assume a "sum-power constraint" of the form
$\sum_{n,k,m} I^t_{n,k,m} P^t_{n,k,m} \le X_{\mathrm{con}}$ for all $t$.
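The two constraints above can be checked mechanically. The following is a minimal Python sketch, not part of the original formulation: the dictionary-based encoding and the names `is_feasible`, `schedule`, `powers`, and `x_con` are illustrative assumptions.

```python
def is_feasible(schedule, powers, x_con, num_subchannels):
    """Check the subchannel-resource and sum-power constraints for one time slot.

    schedule: dict mapping (n, k, m) -> 0 or 1    (hypothetical encoding of I_{n,k,m})
    powers:   dict mapping (n, k, m) -> power >= 0 (hypothetical encoding of P_{n,k,m})
    """
    used = [0] * num_subchannels  # user/MCS pairs placed on each subchannel
    total_power = 0.0
    for (n, k, m), flag in schedule.items():
        if flag not in (0, 1):
            return False
        power = powers.get((n, k, m), 0.0)
        if power < 0.0:
            return False
        used[n] += flag
        total_power += flag * power
    # At most one user/MCS per subchannel, and total scheduled power within budget.
    return all(count <= 1 for count in used) and total_power <= x_con
```

Only scheduled combinations contribute to the power sum, mirroring the product $I^t_{n,k,m} P^t_{n,k,m}$ in the sum-power constraint.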

Our goal in scheduling and resource allocation is to maximize an expected long-term utility
criterion that is a function of the per-user/rate/subchannel goodputs, i.e.,
$\mathrm{E}\{\sum_t \sum_{n,k,m} I^t_{n,k,m} U_{n,k,m}(g^t_{n,k,m})\}$.
Here, $g^t_{n,k,m}$ denotes the goodput contributed by user $k$ with MCS $m$ on subchannel $n$ at time
$t$, which can be expanded as $g^t_{n,k,m} = (1 - \epsilon^t_{n,k,m}) r_m$. Meanwhile, $U_{n,k,m}(\cdot)$ is a generic
utility function that we assume (for technical reasons) is twice differentiable, strictly-increasing,
and concave, with $U_{n,k,m}(0) < \infty$. We use $U_{n,k,m}(\cdot)$ to transform goodput into other metrics that
are more meaningful from the perspective of quality-of-service (QoS), fairness [21], or pricing
(e.g., [13]-[15]). For example, to maximize sum-goodput, one would simply use $U_{n,k,m}(x) = x$.
To enforce fairness across users, one could instead maximize weighted sum-goodput via
$U_{n,k,m}(x) = w_k x$, where $\{w_k\}$ are appropriately chosen user-dependent weights. To maximize
sum capacity, i.e., $\sum_{n,k} \log(1 + P_{n,k,1}\gamma_{n,k})$, one would choose $M = a_1 = b_1 = r_1 = 1$
and $U_{n,k,1}(x) = \log(1 - \log(1 - x))$ for $x \in [0,1)$. To incorporate user-fairness into capacity
maximization, one could instead choose $U_{n,k,1}(x) = w_k \log(1 - \log(1 - x))$, where again $\{w_k\}$
are appropriately chosen user-dependent weights [20].
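The sum-capacity choice can be verified numerically: with $a_1 = b_1 = r_1 = 1$, the goodput is $1 - e^{-P\gamma}$, so $\log(1 - \log(1 - x))$ evaluates to $\log(1 + P\gamma)$. The short Python sketch below illustrates this; the function names are illustrative and do not come from the paper.

```python
import math

def packet_error_rate(a_m, b_m, power, ssg):
    """Error-rate model: epsilon = a_m * exp(-b_m * P * gamma)."""
    return a_m * math.exp(-b_m * power * ssg)

def goodput(a_m, b_m, r_m, power, ssg):
    """Goodput (1 - epsilon) * r_m of a scheduled user/MCS combination."""
    return (1.0 - packet_error_rate(a_m, b_m, power, ssg)) * r_m

def capacity_utility(x):
    """U(x) = log(1 - log(1 - x)), defined for x in [0, 1)."""
    return math.log(1.0 - math.log(1.0 - x))

# With a = b = r = 1, the utility of the goodput equals log(1 + P * gamma).
power, ssg = 2.5, 0.8
gap = capacity_utility(goodput(1.0, 1.0, 1.0, power, ssg)) - math.log(1.0 + power * ssg)
```

Here `gap` is zero up to floating-point rounding, which is the identity the text relies on.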

For each time $t$, the BS performs scheduling and resource allocation based on posterior
distributions on the SSGs $\{\gamma^t_{n,k}\}$ inferred from previously received ACK/NAK feedback. In
the sequel, we write the ACK/NAK feedback about the packet transmitted to user $k$ across
subchannel $n$ at time $t$ as $f^t_{n,k} \in \{1, 0, \emptyset\}$, where $1$ indicates an ACK, $0$ indicates a NAK, and
$\emptyset$ covers the case that user $k$ was not scheduled on subchannel $n$ at time $t$. Thus, in the case
of an infinite past horizon and a feedback delay of $d \ge 1$ packets, the BS would have access to
the feedbacks $\{f^\tau_{n,k} \ \forall n, k\}_{\tau=-\infty}^{t-d}$ for time-$t$ scheduling.

III. Optimal Scheduling and Resource Allocation

In this section, we describe the optimal solution to the problem of scheduling and resource
allocation over the finite time-horizon $t \in \{1, \dots, T\}$. For this purpose, some additional notation
will be useful. To denote the collection of all time-$t$ scheduling variables $\{I^t_{n,k,m}\}$, we use
$\boldsymbol{I}^t \in \{0, 1\}^{N \times K \times M}$. To denote the collection of all time-$t$ powers $\{P^t_{n,k,m}\}$, we use
$\boldsymbol{P}^t \in [0,\infty)^{N \times K \times M}$. To denote the collection of all time-$t$ ACK/NAK feedbacks $\{f^t_{n,k}\}$,
we use $\boldsymbol{F}^t \in \{1,0,\emptyset\}^{N \times K}$, and to denote the collection of all time-$t$ user-$k$ feedbacks,
we use $\boldsymbol{f}^t_k \in \{1, 0, \emptyset\}^{N}$.

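For time-$t$ scheduling and resource allocation, the controller has access to the previous
feedback $\boldsymbol{F}^{t-d}_{-\infty} = \{\boldsymbol{F}^{-\infty}, \dots, \boldsymbol{F}^{t-d}\}$, scheduling decisions
$\boldsymbol{I}^{t-d}_{-\infty} = \{\boldsymbol{I}^{-\infty}, \dots, \boldsymbol{I}^{t-d}\}$, and power
allocations $\boldsymbol{P}^{t-d}_{-\infty} = \{\boldsymbol{P}^{-\infty}, \dots, \boldsymbol{P}^{t-d}\}$. It then uses this knowledge to determine the schedule $\boldsymbol{I}^t$
and power allocation $\boldsymbol{P}^t$ maximizing the expected utility of the current and remaining packets:

$$
(\boldsymbol{I}^{t,\mathrm{opt}}, \boldsymbol{P}^{t,\mathrm{opt}}) = \arg\max_{(\boldsymbol{I}^t,\boldsymbol{P}^t)\in\mathcal{X}}
\mathrm{E}\Big\{ \sum_{n,k,m} I^t_{n,k,m}\, U_{n,k,m}\big((1 - a_m e^{-b_m P^t_{n,k,m}\gamma^t_{n,k}})\, r_m\big)
+ \sum_{\tau=t+1}^{T} \sum_{n,k,m} I^{\tau,\mathrm{opt}}_{n,k,m}\, U_{n,k,m}\big((1 - a_m e^{-b_m P^{\tau,\mathrm{opt}}_{n,k,m}\gamma^{\tau}_{n,k}})\, r_m\big)
\,\Big|\, \boldsymbol{I}^{t-d}_{-\infty}, \boldsymbol{P}^{t-d}_{-\infty}, \boldsymbol{F}^{t-d}_{-\infty} \Big\},
\tag{1}
$$

where the domain of $\boldsymbol{I}^t$ is $\mathcal{I} = \{\boldsymbol{I} \in \{0, 1\}^{N \times K \times M} : \sum_{k,m} I_{n,k,m} \le 1 \ \forall n\}$, the domain of $\boldsymbol{P}^t$ is
$\mathcal{P} = [0,\infty)^{N \times K \times M}$, and $\mathcal{X} = \{(\boldsymbol{I},\boldsymbol{P}) \in \mathcal{I} \times \mathcal{P} : \sum_{n,k,m} I_{n,k,m} P_{n,k,m} \le X_{\mathrm{con}}\}$. The expectation
in (1) is jointly over the squared subchannel gains (SSGs) $\{\gamma^\tau_{n,k} : \tau = t, \dots, T, \forall n, \forall k\}$. Using
the abbreviation $U^t_{n,k,m}(I^t_{n,k,m}, P^t_{n,k,m}) = I^t_{n,k,m}\, U_{n,k,m}\big((1 - a_m e^{-b_m P^t_{n,k,m}\gamma^t_{n,k}})\, r_m\big)$ and the feedback history
$\boldsymbol{F}^{t-d}_{-\infty} = \{\boldsymbol{F}^{-\infty}, \dots, \boldsymbol{F}^{t-d}\}$, the optimal expected utility over the remaining packets $\{t, \dots, T\}$ can be
written (for $t \ge 0$) as

$$
U^{*,t}_{\mathrm{tot}}(\boldsymbol{F}^{t-d}_{-\infty}) = \mathrm{E}\Big\{ \sum_{\tau=t}^{T} \sum_{n,k,m} U^{\tau}_{n,k,m}\big(I^{\tau,\mathrm{opt}}_{n,k,m}, P^{\tau,\mathrm{opt}}_{n,k,m}\big) \,\Big|\, \boldsymbol{F}^{t-d}_{-\infty} \Big\}.
\tag{2}
$$

For a unit-delay² system (i.e., $d = 1$), the following Bellman equation [22] specifies the
corresponding finite-horizon dynamic program:

$$
U^{*,t}_{\mathrm{tot}}(\boldsymbol{F}^{t-1}_{-\infty}) = \max_{(\boldsymbol{I}^t,\boldsymbol{P}^t)\in\mathcal{X}}
\Big[ \mathrm{E}\Big\{ \sum_{n,k,m} U^t_{n,k,m}\big(I^t_{n,k,m}, P^t_{n,k,m}\big) \,\Big|\, \boldsymbol{F}^{t-1}_{-\infty} \Big\}
+ \mathrm{E}\Big\{ U^{*,t+1}_{\mathrm{tot}}(\boldsymbol{F}^{t}_{-\infty}) \,\Big|\, \boldsymbol{F}^{t-1}_{-\infty} \Big\} \Big],
\tag{3}
$$

where the second expectation is over the feedbacks $\boldsymbol{F}^t$. The solution obtained by solving (1) is
typically referred to as a partially observable Markov decision process (POMDP) [12].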

The definition of $\mathcal{P}$ implies that the controller has an uncountably infinite number of possible
actions. Although this could be circumvented (at the expense of performance) by restricting the
powers $P^t_{n,k,m}$ to come from a finite set, the problem would remain very complex due to the
continuous-state nature of the SSGs $\gamma^t_{n,k}$. While these SSGs could then be quantized (causing
additional performance loss), the problem would still remain computationally intensive, since
finite-horizon POMDPs (even with finite states and actions) are PSPACE-complete, and known exact
solution methods require both complexity and memory that grow exponentially with the horizon $T$ [23].
To see why, notice from (3) that the solution of the problem at every time $t$ depends on the optimal
solution at times up to $t-1$. Because both terms on the right side of (3) are dependent on
$(\boldsymbol{I}^t, \boldsymbol{P}^t)$, however, the solution of the problem at time $t$ also depends on the solution of the
problem at time $t+1$, which in turn depends on the solution of the problem at time $t+2$, and so on.
In conclusion, the optimal controller is not practical to implement, even under power/SSG quantization.

Consequently, we will turn our attention to (sub-optimal) greedy strategies, i.e., those that do 
not consider the effect of current actions on future utilities. To better understand their performance 
relative to that of the optimal POMDP, we derive an upper bound on POMDP performance. 

² For the d > 1 case, the Bellman equation is more complicated, and so we omit it for brevity.
A. The "Causal Global Genie" Upper Bound

Our POMDP-performance upper-bound, which we will refer to as the "causal global genie"
(CGG), is based on the presumption of perfect error-rate feedback of all previous user/subchannel
combinations, i.e., $\{\epsilon^\tau_{n,k,m} \ \forall n, k, \tau \le t-d\}$. For comparison, the ACK/NAK feedback available to
the POMDP is a form of degraded error-rate feedback on previously scheduled user/subchannel
combinations. Since, given knowledge of $\epsilon^\tau_{n,k,m}$ and $P^\tau_{n,k,m}$ for any MCS index $m$, the SSG $\gamma^\tau_{n,k}$ can
be obtained by simply inverting the error-rate expression $\epsilon^\tau_{n,k,m} = a_m e^{-b_m P^\tau_{n,k,m} \gamma^\tau_{n,k}}$, our
genie-aided bound is based, equivalently, on perfect feedback of all previous SSGs
$\{\gamma^\tau_{n,k} \ \forall n, k, \tau \le t-d\}$. In the sequel, we use $\boldsymbol{\gamma}^t \in [0, \infty)^{N \times K}$ to denote the collection of all time-$t$ SSGs
$\{\gamma^t_{n,k} \ \forall k, n\}$, and we define $\boldsymbol{\gamma}^{t-d}_{-\infty} = \{\boldsymbol{\gamma}^{-\infty}, \dots, \boldsymbol{\gamma}^{t-d}\}$.

We characterize the CGG as "global" since it uses feedback from all user/subchannel com- 
binations, not just the previously scheduled ones. Although a tighter bound might result if the 
(perfect) error-rate feedback was restricted to only previously scheduled user/subchannel pairs, 
the bounding solution would remain a POMDP with an uncountable number of state-action 
pairs, making it impractical to evaluate. Evaluating the performance of the CGG, however, is 
straightforward since — under CGG feedback — optimal scheduling and resource maximization 
can be performed greedily. To see why, notice that, for any scheduling time t ^ 0, the CGG 
scheme allocates resources according to the following mixed-integer optimization problem: 



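$$
(\boldsymbol{I}^{t,\mathrm{cgg}}, \boldsymbol{P}^{t,\mathrm{cgg}}) = \arg\max_{(\boldsymbol{I}^t,\boldsymbol{P}^t)\in\mathcal{X}}
\mathrm{E}\Big\{ \sum_{n,k,m} U^t_{n,k,m}\big(I^t_{n,k,m}, P^t_{n,k,m}\big)
+ \sum_{\tau=t+1}^{T} \sum_{n,k,m} U^{\tau}_{n,k,m}\big(I^{\tau,\mathrm{cgg}}_{n,k,m}, P^{\tau,\mathrm{cgg}}_{n,k,m}\big)
\,\Big|\, \boldsymbol{I}^{t-d}_{-\infty}, \boldsymbol{P}^{t-d}_{-\infty}, \boldsymbol{\gamma}^{t-d}_{-\infty} \Big\}.
\tag{4}
$$

Since the choice of $\{(\boldsymbol{I}^{t+1,\mathrm{cgg}}, \boldsymbol{P}^{t+1,\mathrm{cgg}}), \dots, (\boldsymbol{I}^{T,\mathrm{cgg}}, \boldsymbol{P}^{T,\mathrm{cgg}})\}$ does not depend on the choice of
$(\boldsymbol{I}^{t,\mathrm{cgg}}, \boldsymbol{P}^{t,\mathrm{cgg}})$, the previous optimization problem simplifies to

$$
(\boldsymbol{I}^{t,\mathrm{cgg}}, \boldsymbol{P}^{t,\mathrm{cgg}}) = \arg\max_{(\boldsymbol{I}^t,\boldsymbol{P}^t)\in\mathcal{X}}
\mathrm{E}\Big\{ \sum_{n,k,m} U^t_{n,k,m}\big(I^t_{n,k,m}, P^t_{n,k,m}\big) \,\Big|\, \boldsymbol{\gamma}^{t-d}_{-\infty} \Big\}.
\tag{5}
$$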



In the following lemma, we formally establish that the utility achieved by the CGG upper- 
bounds that achieved by the optimal POMDP controller with ACK/NAK feedback. 

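Lemma 1: Given arbitrary past allocations $(\boldsymbol{I}^{t-d}_{-\infty}, \boldsymbol{P}^{t-d}_{-\infty})$ and the corresponding ACK/NAKs
$\boldsymbol{F}^{t-d}_{-\infty}$, the expected total utility for optimal resource allocation under the latter feedback is no
higher than the expected total utility under CGG feedback, i.e.,

$$
\mathrm{E}\Big\{ \sum_{n,k,m} \sum_{\tau=t}^{T} U^{\tau}_{n,k,m}\big(I^{\tau,\mathrm{opt}}_{n,k,m}, P^{\tau,\mathrm{opt}}_{n,k,m}\big) \,\Big|\, \boldsymbol{F}^{t-d}_{-\infty} \Big\}
\le
\mathrm{E}\Big\{ \sum_{n,k,m} \sum_{\tau=t}^{T} U^{\tau}_{n,k,m}\big(I^{\tau,\mathrm{cgg}}_{n,k,m}, P^{\tau,\mathrm{cgg}}_{n,k,m}\big) \,\Big|\, \boldsymbol{F}^{t-d}_{-\infty} \Big\}.
\tag{6}
$$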

The proof of the above lemma follows the same steps as the proof of [11, Lemma 1], which 
is omitted here to save space. In the next section, we detail the greedy scheduling and resource 
allocation problem and propose a near-optimal solution. 

IV. Greedy Scheduling and Resource Allocation 
The greedy scheduling and resource allocation (GSRA) problem is defined as follows. 

$$
\mathrm{GSRA} \triangleq \max_{\boldsymbol{I}^t \in \mathcal{I},\ \boldsymbol{P}^t \in \mathcal{P}}
\sum_{n=1}^{N} \sum_{k=1}^{K} \sum_{m=1}^{M} I^t_{n,k,m}\,
\mathrm{E}\Big\{ U_{n,k,m}\big((1 - a_m e^{-b_m P^t_{n,k,m}\gamma^t_{n,k}})\, r_m\big) \,\Big|\, \boldsymbol{F}^{t-d}_{-\infty} \Big\}
\quad \text{s.t.} \quad \sum_{n,k,m} I^t_{n,k,m} P^t_{n,k,m} \le X_{\mathrm{con}}.
\tag{7}
$$

Note that, in contrast to the $T$-horizon objective (1), the greedy objective (7) does not consider
the effect of $(\boldsymbol{I}^t, \boldsymbol{P}^t)$ on future utility. As stated earlier, we allow $U_{n,k,m}(\cdot)$ to be any
real-valued function that is twice differentiable, strictly-increasing, and concave, with $U_{n,k,m}(0) < \infty$.
Therefore, $U'_{n,k,m}(\cdot) > 0$ and $U''_{n,k,m}(\cdot) \le 0$, using $'$ to denote the derivative.

Since it involves both discrete ($\boldsymbol{I}^t$) and continuous ($\boldsymbol{P}^t$) optimization variables, the GSRA
problem (7) is a mixed-integer optimization problem. Such problems are generally NP-hard,
meaning that no polynomial-complexity solution is known (and none exists unless P = NP). Thus,
in Section IV-B, we propose a near-optimal algorithm for (7) with polynomial complexity. To
better explain that scheme, we first describe, in Section IV-A, a "brute force" optimal solution
whose complexity grows exponentially in $N$, the number of subchannels.

A. Brute-Force Algorithm 

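The brute-force approach considers all possibilities of $\boldsymbol{I}^t \in \mathcal{I}$, each with the corresponding
optimal power allocation. Supposing that $\boldsymbol{I}^t = \boldsymbol{I}$, the optimal power allocation can be found by
solving the convex optimization problem

$$
\min_{\boldsymbol{P} \in \mathcal{P}} \ - \sum_{n=1}^{N} \sum_{k=1}^{K} \sum_{m=1}^{M} I_{n,k,m}\,
\mathrm{E}\Big\{ U_{n,k,m}\big((1 - a_m e^{-b_m P_{n,k,m}\gamma^t_{n,k}})\, r_m\big) \,\Big|\, \boldsymbol{F}^{t-d}_{-\infty} \Big\}
\quad \text{s.t.} \quad \sum_{n,k,m} I_{n,k,m} P_{n,k,m} \le X_{\mathrm{con}}.
\tag{8}
$$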
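To proceed, we identify the Lagrangian associated with (8) as

$$
L_{\boldsymbol{I}}(\mu, \boldsymbol{P}) = \Big( \sum_{n,k,m} I_{n,k,m} P_{n,k,m} - X_{\mathrm{con}} \Big)\mu
- \sum_{n,k,m} \mathrm{E}\Big\{ I_{n,k,m}\, U_{n,k,m}\big((1 - a_m e^{-b_m P_{n,k,m}\gamma^t_{n,k}})\, r_m\big) \,\Big|\, \boldsymbol{F}^{t-d}_{-\infty} \Big\},
\tag{9}
$$

which yields the corresponding dual problem

$$
\max_{\mu \ge 0} \min_{\boldsymbol{P} \in \mathcal{P}} L_{\boldsymbol{I}}(\mu, \boldsymbol{P})
= \max_{\mu \ge 0} L_{\boldsymbol{I}}(\mu, \boldsymbol{P}^*(\mu))
= L_{\boldsymbol{I}}(\mu^*_{\boldsymbol{I}}, \boldsymbol{P}^*(\mu^*_{\boldsymbol{I}})),
\tag{10}
$$

where $\mu^*_{\boldsymbol{I}}$ and $\boldsymbol{P}^*(\mu^*_{\boldsymbol{I}})$ denote the optimal Lagrange multiplier and power allocation, respectively.

A detailed solution to (10) is given in [16], and so we describe only the main points here.
First, for a given value of the Lagrange multiplier $\mu$, it has been shown that the optimal powers
equal

$$
P^*_{n,k,m}(\mu) =
\begin{cases}
\tilde{P}_{n,k,m}(\mu) & \text{if } 0 \le \mu \le a_m b_m r_m\, U'_{n,k,m}\big((1 - a_m) r_m\big)\, \mathrm{E}\{\gamma^t_{n,k} \mid \boldsymbol{F}^{t-d}_{-\infty}\} \\
0 & \text{otherwise,}
\end{cases}
\tag{11}
$$

where $\tilde{P}_{n,k,m}(\mu)$ is defined as the (unique) solution to

$$
\mu = a_m b_m r_m\, \mathrm{E}\Big\{ U'_{n,k,m}\big((1 - a_m e^{-b_m \tilde{P}_{n,k,m}(\mu)\gamma^t_{n,k}})\, r_m\big)\, \gamma^t_{n,k}\, e^{-b_m \tilde{P}_{n,k,m}(\mu)\gamma^t_{n,k}} \,\Big|\, \boldsymbol{F}^{t-d}_{-\infty} \Big\}.
\tag{12}
$$

Then, for a given $\boldsymbol{I}$, the optimal value of $\mu$ (i.e., $\mu^*_{\boldsymbol{I}}$) obeys $\mu^*_{\boldsymbol{I}} \in [\mu_{\min}, \mu_{\max}] \subset (0, \infty)$, where

$$
\mu_{\min} = \min_{n,k,m}\ a_m b_m r_m\, \mathrm{E}\Big\{ U'_{n,k,m}\big((1 - a_m e^{-b_m X_{\mathrm{con}}\gamma^t_{n,k}})\, r_m\big)\, \gamma^t_{n,k}\, e^{-b_m X_{\mathrm{con}}\gamma^t_{n,k}} \,\Big|\, \boldsymbol{F}^{t-d}_{-\infty} \Big\},
\tag{13}
$$

$$
\mu_{\max} = \max_{n,k,m}\ a_m b_m r_m\, U'_{n,k,m}\big((1 - a_m) r_m\big)\, \mathrm{E}\{\gamma^t_{n,k} \mid \boldsymbol{F}^{t-d}_{-\infty}\},
\tag{14}
$$

and satisfies $\sum_{n,k,m} I_{n,k,m} P^*_{n,k,m}(\mu^*_{\boldsymbol{I}}) = X_{\mathrm{con}}$.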
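The fixed-multiplier power rule can be illustrated for the sum-goodput utility $U(x) = x$ (so $U' = 1$), with the posterior expectation replaced by a weighted average over SSG samples. This Python sketch is an assumption-laden illustration rather than the procedure of [16]: the sample-based expectation, the inner bisection, and the names `marginal_gain` and `optimal_power` are all hypothetical.

```python
import math

def marginal_gain(power, a_m, b_m, r_m, ssg_samples, weights):
    """Marginal expected utility per unit power for U(x) = x,
    i.e. a*b*r * E{ gamma * exp(-b * P * gamma) }, with E{} taken over weighted samples."""
    return a_m * b_m * r_m * sum(
        w * g * math.exp(-b_m * power * g) for g, w in zip(ssg_samples, weights))

def optimal_power(mu, a_m, b_m, r_m, ssg_samples, weights, iterations=200):
    """Power for a given multiplier mu > 0: zero if mu exceeds the marginal gain at P = 0,
    otherwise the root of marginal_gain(P) = mu, found by bisection (assumed solver)."""
    if mu <= 0.0:
        raise ValueError("mu must be positive")
    if mu >= marginal_gain(0.0, a_m, b_m, r_m, ssg_samples, weights):
        return 0.0
    lo, hi = 0.0, 1.0
    while marginal_gain(hi, a_m, b_m, r_m, ssg_samples, weights) > mu:
        hi *= 2.0  # the marginal gain decreases in P, so this bracket search terminates
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if marginal_gain(mid, a_m, b_m, r_m, ssg_samples, weights) > mu:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For a single sample $\gamma = 1$ with $a = b = r = 1$, the condition reduces to $\mu = e^{-P}$, so the sketch should return approximately $-\ln \mu$ for $\mu < 1$ and zero for $\mu \ge 1$.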

Based on (11)-(14), Table I details the brute-force steps for a given $\boldsymbol{I}$. In the end, for a
specified tolerance $\kappa$, these steps find $\underline{\mu}$ and $\bar{\mu}$ such that $\mu^*_{\boldsymbol{I}} \in [\underline{\mu}, \bar{\mu}]$ and $\bar{\mu} - \underline{\mu} < \kappa$. Using
an approximation of $\mu^*_{\boldsymbol{I}}$ that lies in $[\underline{\mu}, \bar{\mu}]$, the corresponding utility is guaranteed to lie
within $\kappa X_{\mathrm{con}}$ of the optimum (for the given $\boldsymbol{I}$). Therefore, by adjusting $\kappa$, one can achieve a
performance arbitrarily close to the optimum. Since $|\mathcal{I}| = (KM + 1)^N$ values of $\boldsymbol{I}$ must be
considered, the total complexity of the brute-force approach, in terms of the number of times
(12) must be solved, can be shown to be

$$
\Big\lceil \log_2\Big(\frac{\mu_{\max} - \mu_{\min}}{\kappa}\Big) \Big\rceil \times (KM + 1)^{N-1} NKM,
\tag{15}
$$

which grows exponentially with $N$.
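To make the counting argument concrete, the Python sketch below enumerates the admissible schedules for a small system and evaluates the brute-force complexity expression. The function names and the encoding of an idle subchannel as `None` are illustrative assumptions.

```python
import itertools
import math

def enumerate_schedules(num_subchannels, num_users, num_mcs):
    """Yield every admissible schedule: each subchannel is either idle (None)
    or carries exactly one (user, MCS) pair, giving (K*M + 1)**N schedules."""
    options = [None] + [(k, m) for k in range(num_users) for m in range(num_mcs)]
    return itertools.product(options, repeat=num_subchannels)

def brute_force_solves(num_subchannels, num_users, num_mcs, mu_min, mu_max, kappa):
    """Number of scalar power equations solved by the brute-force search:
    ceil(log2((mu_max - mu_min) / kappa)) * (K*M + 1)**(N - 1) * N*K*M."""
    iterations = math.ceil(math.log2((mu_max - mu_min) / kappa))
    km = num_users * num_mcs
    return iterations * (km + 1) ** (num_subchannels - 1) * num_subchannels * km
```

The factor $(KM+1)^{N-1} NKM$ equals the total number of scheduled $(n,k,m)$ entries summed over all schedules, which is why it multiplies the number of bisection iterations.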
B. Proposed Algorithm

We propose to attack the mixed-integer GSRA problem (7) using the well known Lagrangian
relaxation approach [22]. In doing so, we relax the domain of the scheduling variables $I^t_{n,k,m}$ from
the set $\{0, 1\}$ to the interval $[0, 1]$, allowing the application of low-complexity dual optimization
techniques. Although the solution to the relaxed problem does not necessarily coincide with that
of the original greedy problem (7), we establish in the sequel that the corresponding performance
loss is very small, and in some cases zero.

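The relaxed version of the greedy problem (7) is

$$
\mathrm{rGSRA} \triangleq \max_{\boldsymbol{I}^t \in \mathcal{I}_c,\ \boldsymbol{P}^t \in \mathcal{P}}
\sum_{n=1}^{N} \sum_{k=1}^{K} \sum_{m=1}^{M} I^t_{n,k,m}\,
\mathrm{E}\Big\{ U_{n,k,m}\big((1 - a_m e^{-b_m P^t_{n,k,m}\gamma^t_{n,k}})\, r_m\big) \,\Big|\, \boldsymbol{F}^{t-d}_{-\infty} \Big\}
\quad \text{s.t.} \quad \sum_{n,k,m} I^t_{n,k,m} P^t_{n,k,m} \le X_{\mathrm{con}},
\tag{16}
$$

where $\mathcal{I}_c = \{\boldsymbol{I} \in [0, 1]^{N \times K \times M} : \sum_{k,m} I_{n,k,m} \le 1 \ \forall n\}$. Although (16) is a non-convex optimization
problem due to non-convex constraints, it can be converted into a convex optimization problem
by using the new set of variables $(\boldsymbol{I}^t, \boldsymbol{x}^t)$, where $x^t_{n,k,m} = I^t_{n,k,m} P^t_{n,k,m}$. In this case, we have

$$
\mathrm{rGSRA} = \min_{\boldsymbol{x}^t \succeq 0,\ \boldsymbol{I}^t \in \mathcal{I}_c} \sum_{n,k,m} I^t_{n,k,m}\, B^t_{n,k,m}\big(I^t_{n,k,m}, x^t_{n,k,m}\big)
\quad \text{s.t.} \quad \sum_{n,k,m} x^t_{n,k,m} \le X_{\mathrm{con}},
\tag{17}
$$

where $\boldsymbol{x}^t \in \mathbb{R}^{N \times K \times M}$ denotes the collection of all time-$t$ variables $\{x^t_{n,k,m}\}$, $\succeq$ denotes
element-wise non-negativity, and $B^t_{n,k,m}(\cdot,\cdot)$ is defined as

$$
B^t_{n,k,m}(y_1, y_2) =
\begin{cases}
-\mathrm{E}\Big\{ U_{n,k,m}\big((1 - a_m e^{-b_m \gamma^t_{n,k} y_2 / y_1})\, r_m\big) \,\Big|\, \boldsymbol{F}^{t-d}_{-\infty} \Big\} & \text{if } y_1 \ne 0 \\
0 & \text{otherwise.}
\end{cases}
\tag{18}
$$

The modified problem (17) is a convex optimization problem and can be solved using a dual
optimization approach with zero duality gap. In particular, the dual problem can be written as

$$
\max_{\mu \ge 0} \min_{\boldsymbol{x}^t \succeq 0,\ \boldsymbol{I}^t \in \mathcal{I}_c} L(\mu, \boldsymbol{I}^t, \boldsymbol{x}^t)
= \max_{\mu \ge 0} \min_{\boldsymbol{I}^t \in \mathcal{I}_c} L\big(\mu, \boldsymbol{I}^t, \boldsymbol{x}^{t,*}(\mu, \boldsymbol{I}^t)\big)
= \max_{\mu \ge 0} L\big(\mu, \boldsymbol{I}^{t,*}(\mu), \boldsymbol{x}^{t,*}(\mu, \boldsymbol{I}^{t,*}(\mu))\big)
= L\big(\mu^*, \boldsymbol{I}^{t,*}(\mu^*), \boldsymbol{x}^{t,*}(\mu^*, \boldsymbol{I}^{t,*}(\mu^*))\big),
\tag{19}
$$

where

$$
L(\mu, \boldsymbol{I}, \boldsymbol{x}) = \sum_{n,k,m} I_{n,k,m}\, B^t_{n,k,m}\big(I_{n,k,m}, x_{n,k,m}\big) + \Big( \sum_{n,k,m} x_{n,k,m} - X_{\mathrm{con}} \Big)\mu,
\tag{20}
$$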

October 19, 2011 



DRAFT 






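where $\boldsymbol{x}^{t,*}(\mu, \boldsymbol{I})$ is the optimal $\boldsymbol{x}$ for a given $(\mu, \boldsymbol{I})$, where $\boldsymbol{I}^{t,*}(\mu)$ denotes the optimal $\boldsymbol{I} \in \mathcal{I}_c$ for
a given $\mu$, and where $\mu^*$ denotes the optimal $\mu \ge 0$.

A detailed solution to this problem was given in [16], and so we describe only the main points
here. For given values of $\mu$ and $\boldsymbol{I}^t$, we have $x^{t,*}_{n,k,m}(\mu, \boldsymbol{I}^t) = I^t_{n,k,m} P^{t,*}_{n,k,m}(\mu)$, where

$$
P^{t,*}_{n,k,m}(\mu) =
\begin{cases}
\tilde{P}^t_{n,k,m}(\mu) & \text{if } 0 \le \mu \le a_m b_m r_m\, U'_{n,k,m}\big((1 - a_m) r_m\big)\, \mathrm{E}\{\gamma^t_{n,k} \mid \boldsymbol{F}^{t-d}_{-\infty}\} \\
0 & \text{otherwise,}
\end{cases}
\tag{21}
$$

and where $\tilde{P}^t_{n,k,m}(\mu)$ is defined as the (unique) solution to

$$
\mu = a_m b_m r_m\, \mathrm{E}\Big\{ U'_{n,k,m}\big((1 - a_m e^{-b_m \tilde{P}^t_{n,k,m}(\mu)\gamma^t_{n,k}})\, r_m\big)\, \gamma^t_{n,k}\, e^{-b_m \tilde{P}^t_{n,k,m}(\mu)\gamma^t_{n,k}} \,\Big|\, \boldsymbol{F}^{t-d}_{-\infty} \Big\}.
\tag{22}
$$

To give equations that govern $\boldsymbol{I}^{t,*}(\mu)$ for a given $\mu$, we first define

$$
V^t_{n,k,m}\big(\mu, P^{t,*}_{n,k,m}(\mu)\big) = -\mathrm{E}\Big\{ U_{n,k,m}\big((1 - a_m e^{-b_m P^{t,*}_{n,k,m}(\mu)\gamma^t_{n,k}})\, r_m\big) \,\Big|\, \boldsymbol{F}^{t-d}_{-\infty} \Big\} + \mu P^{t,*}_{n,k,m}(\mu),
\tag{23}
$$

$$
S^t_n(\mu) = \Big\{ (k,m) = \arg\min_{(k',m')} V^t_{n,k',m'}\big(\mu, P^{t,*}_{n,k',m'}(\mu)\big) \ :\ V^t_{n,k,m}\big(\mu, P^{t,*}_{n,k,m}(\mu)\big) \le 0 \Big\}.
\tag{24}
$$

If $S^t_n(\mu)$ is a null or a singleton set, then the optimal schedule on subchannel $n$ is given by

$$
I^{t,*}_{n,k,m}(\mu) =
\begin{cases}
1 & \text{if } (k,m) \in S^t_n(\mu) \\
0 & \text{otherwise.}
\end{cases}
\tag{25}
$$

However, if $S^t_n(\mu)$ has cardinality greater than one, then multiple $(k,m)$ combinations can be
scheduled simultaneously while achieving the optimal value of the Lagrangian. In particular, if
$S^t_n(\mu) = \{(k_1(n), m_1(n)), \dots, (k_{|S^t_n(\mu)|}(n), m_{|S^t_n(\mu)|}(n))\}$, then

$$
I^{t,*}_{n,k,m}(\mu) =
\begin{cases}
I_{n,k_i(n),m_i(n)} & \text{if } (k,m) = (k_i(n), m_i(n)) \text{ for some } i \in \{1, \dots, |S^t_n(\mu)|\} \\
0 & \text{otherwise,}
\end{cases}
\tag{26}
$$

where the vector $[I_{n,k_1(n),m_1(n)}, \dots, I_{n,k_{|S^t_n(\mu)|}(n),m_{|S^t_n(\mu)|}(n)}]$ lies anywhere in the unit-$(|S^t_n(\mu)|-1)$
simplex, i.e., it lives within the region $[0, 1]^{|S^t_n(\mu)|}$ and satisfies $\sum_{i=1}^{|S^t_n(\mu)|} I_{n,k_i(n),m_i(n)} = 1$. Finally,
the optimal Lagrange multiplier $\mu$ (i.e., $\mu^*$) is such that $\mu^* \in [\mu_{\min}, \mu_{\max}] \subset (0, \infty)$ and

$$
\sum_{n,k,m} I^{t,*}_{n,k,m}(\mu^*)\, P^{t,*}_{n,k,m}(\mu^*) = X_{\mathrm{con}},
\tag{27}
$$

where $\mu_{\min}$ and $\mu_{\max}$ were given in (13) and (14), respectively.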

For several fixed values of $\mu$, the proposed algorithm minimizes the relaxed Lagrangian (20)
over $(\boldsymbol{I}^t, \boldsymbol{x}^t)$ (or, equivalently, over $(\boldsymbol{I}^t, \boldsymbol{P}^t)$) to obtain candidate solutions for the original greedy
problem (7). If, for a given $\mu$, $|S^t_n(\mu)| \le 1$ for all $n$ (i.e., the candidate employs at most one
user/MCS per subchannel), then the candidate solution is admissible for the non-relaxed problem,
and thus retained by the proposed algorithm. If, on the other hand, $|S^t_n(\mu)| > 1$ for some $n$
(i.e., the candidate employs more than one user/MCS on some subchannels), then the proposed
algorithm transforms the candidate into an admissible solution as follows:

$$
I^{t,\mathrm{gsra}}_{n,k,m}(\mu) =
\begin{cases}
1 & \text{for a single pair } (k,m) \text{ selected from } S^t_n(\mu) \\
0 & \text{otherwise.}
\end{cases}
\tag{28}
$$

The following lemma then states an important property of these fixed-$\mu$ admissible solutions.

Lemma 2: For any given value of $\mu$, let the power allocation $\boldsymbol{P}^{t,*}(\mu)$ be given by (21), let
the user-MCS allocation $\boldsymbol{I}^{t,\mathrm{gsra}}(\mu)$ be given by (28), and let the total power allocation be defined
as $X^{t,\mathrm{gsra}}_{\mathrm{tot}}(\mu) = \sum_{n,k,m} I^{t,\mathrm{gsra}}_{n,k,m}(\mu)\, P^{t,*}_{n,k,m}(\mu)$. Then, $X^{t,\mathrm{gsra}}_{\mathrm{tot}}(\mu)$ is monotonically decreasing in $\mu$.

Lemma 2 (see [16] for a proof) implies that the optimal value of the Lagrange multiplier $\mu$
(i.e., $\mu^*$) is the one that achieves the power constraint $X^{t,\mathrm{gsra}}_{\mathrm{tot}}(\mu) = X_{\mathrm{con}}$. To find this $\mu^*$, the
proposed algorithm performs a bisection search over $\mu \in [\mu_{\min}, \mu_{\max}]$ that refines the search
interval $[\underline{\mu}, \bar{\mu}]$ until $\bar{\mu} - \underline{\mu} < \kappa$, where $\kappa$ is a user-defined tolerance. Then, between the two
schedules $\boldsymbol{I} \in \{\boldsymbol{I}^{t,\mathrm{gsra}}(\underline{\mu}), \boldsymbol{I}^{t,\mathrm{gsra}}(\bar{\mu})\}$, it chooses the one that maximizes utility, reminiscent of the
brute-force algorithm. Table II summarizes the proposed algorithm.

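The complexity of the proposed algorithm, in terms of the number of times (22) is solved, is

$$
\Big\lceil \log_2\Big(\frac{\mu_{\max} - \mu_{\min}}{\kappa}\Big) \Big\rceil \times N(KM + 2),
\tag{29}
$$

which is significantly less than the brute-force complexity in (15). Although the proposed
algorithm is sub-optimal, the difference between the optimal GSRA utility $U^*_{\mathrm{GSRA}}$ and that
attained by the proposed algorithm $\hat{U}_{\mathrm{GSRA}}(\underline{\mu}, \bar{\mu})$, as $\underline{\mu} \to \bar{\mu}$, can be bounded as follows [16]:

$$
U^*_{\mathrm{GSRA}} - \lim_{\underline{\mu} \to \bar{\mu}} \hat{U}_{\mathrm{GSRA}}(\underline{\mu}, \bar{\mu})
\le (\mu^* - \mu_{\min})\big(X_{\mathrm{con}} - X^{t,\mathrm{gsra}}_{\mathrm{tot}}(\mu^*)\big)
\tag{30}
$$

$$
\le
\begin{cases}
0 & \text{if } |S^t_n(\mu^*)| \le 1 \ \forall n \\
(\mu_{\max} - \mu_{\min}) X_{\mathrm{con}} & \text{otherwise.}
\end{cases}
\tag{31}
$$

In Section VI, we evaluate (30) by simulation, and show that the performance loss is negligible.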
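The bisection over the Lagrange multiplier can be sketched in Python as below. This is a simplified illustration under explicit assumptions, not the paper's Table II: it uses the sum-goodput utility $U(x) = x$, represents each posterior by equally weighted SSG samples, breaks ties in the per-subchannel minimization by taking the first minimizer, and returns only the schedule at the upper multiplier (which always satisfies the power budget) instead of comparing both endpoints. All function names are hypothetical.

```python
import math

def expected(fn, samples):
    """Sample average standing in for the posterior expectation."""
    return sum(fn(g) for g in samples) / len(samples)

def optimal_power(mu, mcs, samples):
    """Per-combination power for multiplier mu, assuming U(x) = x."""
    a, b, r = mcs
    gain = lambda p: a * b * r * expected(lambda g: g * math.exp(-b * p * g), samples)
    if mu >= gain(0.0):
        return 0.0
    lo, hi = 0.0, 1.0
    while gain(hi) > mu:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if gain(mid) > mu else (lo, mid)
    return 0.5 * (lo + hi)

def lagrangian_term(mu, power, mcs, samples):
    """Negative expected utility plus mu times power, for one (n, k, m)."""
    a, b, r = mcs
    return -expected(lambda g: (1.0 - a * math.exp(-b * power * g)) * r, samples) + mu * power

def schedule_for(mu, mcs_list, posterior):
    """posterior[n][k] is a list of SSG samples; returns one (k, m, power) or None per subchannel."""
    result = []
    for users in posterior:
        best = None
        for k, samples in enumerate(users):
            for m, mcs in enumerate(mcs_list):
                p = optimal_power(mu, mcs, samples)
                v = lagrangian_term(mu, p, mcs, samples)
                if v <= 0.0 and (best is None or v < best[0]):
                    best = (v, k, m, p)  # first minimizer wins ties (assumed rule)
        result.append(None if best is None else best[1:])
    return result

def total_power(schedule):
    return sum(entry[2] for entry in schedule if entry is not None)

def greedy_allocation(mcs_list, posterior, x_con, kappa=1e-6):
    """Bisection on mu between sample-based versions of mu_min and mu_max."""
    pairs = [(mcs, s) for users in posterior for s in users for mcs in mcs_list]
    mu_hi = max(a * b * r * expected(lambda g: g, s) for (a, b, r), s in pairs)
    mu_lo = min(a * b * r * expected(lambda g: g * math.exp(-b * x_con * g), s)
                for (a, b, r), s in pairs)
    while mu_hi - mu_lo > kappa:
        mid = 0.5 * (mu_lo + mu_hi)
        if total_power(schedule_for(mid, mcs_list, posterior)) > x_con:
            mu_lo = mid  # too much power: raise the price of power
        else:
            mu_hi = mid
    return schedule_for(mu_hi, mcs_list, posterior)
```

With one MCS $(a, b, r) = (1, 1, 1)$ and known gains, the sketch schedules the stronger user on each subchannel and spends nearly the full budget, which is the qualitative behavior the bisection is meant to produce.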






V. Updating the Posterior Distributions from ACK/NAK Feedback 

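In this section, we propose a recursive procedure to compute the posterior pdfs $p(\gamma^t_{n,k} \mid \boldsymbol{F}^{t-d}_{-\infty})$
required by the proposed greedy algorithm in Table II when the channel is first-order³ Markov.

Let the time-$t$ user-$k$ channel be described by the discrete-time channel impulse response
$\boldsymbol{h}^t_k = [h^t_{1,k}, \dots, h^t_{L,k}]^{\mathsf{T}} \in \mathbb{C}^L$, where $(\cdot)^{\mathsf{T}}$ denotes transpose. The corresponding frequency-domain
subchannel gains $\boldsymbol{H}^t_k = [H^t_{1,k}, \dots, H^t_{N,k}]^{\mathsf{T}} \in \mathbb{C}^N$ are then given by

$$
\boldsymbol{H}^t_k = \boldsymbol{G} \boldsymbol{h}^t_k,
\tag{32}
$$

where the OFDMA modulation matrix $\boldsymbol{G} \in \mathbb{C}^{N \times L}$ contains the first $L$ columns of the $N$-DFT
matrix. Assuming additive white Gaussian noise with unit variance, the SSG of subchannel $n$
for user $k$ is given by $\gamma^t_{n,k} = |H^t_{n,k}|^2$, and so we can write

$$
p(\gamma^t_{n,k} \mid \boldsymbol{F}^{t-d}_{-\infty}) = \int_{\boldsymbol{h}^t_k} p(\gamma^t_{n,k} \mid \boldsymbol{h}^t_k)\, p(\boldsymbol{h}^t_k \mid \boldsymbol{F}^{t-d}_{-\infty})
\tag{33}
$$

with $p(\gamma^t_{n,k} \mid \boldsymbol{h}^t_k) = \delta(\gamma^t_{n,k} - |\boldsymbol{e}_n^{\mathsf{T}} \boldsymbol{G} \boldsymbol{h}^t_k|^2)$, where $\delta(\cdot)$ is the Dirac delta and $\boldsymbol{e}_n$ is the $n$th column
of the identity matrix. Using the channel's Markov property and Bayes rule, we find that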

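$$
p(\boldsymbol{h}^t_k \mid \boldsymbol{F}^{t-d}_{-\infty}) = \int_{\boldsymbol{h}^{t-1}_k} p(\boldsymbol{h}^t_k \mid \boldsymbol{h}^{t-1}_k)\, p(\boldsymbol{h}^{t-1}_k \mid \boldsymbol{F}^{t-d}_{-\infty}),
\tag{34}
$$

$$
p(\boldsymbol{h}^{t-d}_k \mid \boldsymbol{F}^{t-d}_{-\infty}) =
\frac{ p(\boldsymbol{f}^{t-d}_k \mid \boldsymbol{h}^{t-d}_k, \boldsymbol{F}^{t-d}_{-\infty} \setminus \boldsymbol{f}^{t-d}_k)\, p(\boldsymbol{h}^{t-d}_k \mid \boldsymbol{F}^{t-d}_{-\infty} \setminus \boldsymbol{f}^{t-d}_k) }
{ \int_{\boldsymbol{h}^{t-d}_k} p(\boldsymbol{f}^{t-d}_k \mid \boldsymbol{h}^{t-d}_k, \boldsymbol{F}^{t-d}_{-\infty} \setminus \boldsymbol{f}^{t-d}_k)\, p(\boldsymbol{h}^{t-d}_k \mid \boldsymbol{F}^{t-d}_{-\infty} \setminus \boldsymbol{f}^{t-d}_k) },
\tag{35}
$$

where $\setminus$ denotes the set-difference operator. Using the fact that $p(\boldsymbol{f}^{t-d}_k \mid \boldsymbol{h}^{t-d}_k, \boldsymbol{F}^{t-d}_{-\infty} \setminus \boldsymbol{f}^{t-d}_k) =
p(\boldsymbol{f}^{t-d}_k \mid \boldsymbol{h}^{t-d}_k, \boldsymbol{I}^{t-d}, \boldsymbol{P}^{t-d})$, along with the fact that $(\boldsymbol{I}^{t-d}, \boldsymbol{P}^{t-d})$ is a deterministic function of $\boldsymbol{F}^{t-d-1}_{-\infty}$
(and therefore of $\boldsymbol{F}^{t-d}_{-\infty}$), we then have from (35) that

$$
p(\boldsymbol{h}^{t-d}_k \mid \boldsymbol{F}^{t-d}_{-\infty}) =
\frac{ p(\boldsymbol{f}^{t-d}_k \mid \boldsymbol{h}^{t-d}_k, \boldsymbol{I}^{t-d}, \boldsymbol{P}^{t-d})\, p(\boldsymbol{h}^{t-d}_k \mid \boldsymbol{F}^{t-d-1}_{-\infty}) }
{ \int_{\boldsymbol{h}^{t-d}_k} p(\boldsymbol{f}^{t-d}_k \mid \boldsymbol{h}^{t-d}_k, \boldsymbol{I}^{t-d}, \boldsymbol{P}^{t-d})\, p(\boldsymbol{h}^{t-d}_k \mid \boldsymbol{F}^{t-d-1}_{-\infty}) }.
\tag{36}
$$

Using the Markov property again, we get

$$
p(\boldsymbol{h}^{t-d}_k \mid \boldsymbol{F}^{t-d-1}_{-\infty}) = \int_{\boldsymbol{h}^{t-d-1}_k} p(\boldsymbol{h}^{t-d}_k \mid \boldsymbol{h}^{t-d-1}_k)\, p(\boldsymbol{h}^{t-d-1}_k \mid \boldsymbol{F}^{t-d-1}_{-\infty}).
\tag{37}
$$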
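For reference, the map (32) from a channel impulse response to the per-subchannel SSGs, on which every step of the recursion relies, can be sketched in Python as follows. The DFT sign convention, the unitary $1/\sqrt{N}$ normalization, and the optional `beta` scaling are assumptions made for this illustration, as are the function names.

```python
import cmath
import math

def modulation_matrix(num_subchannels, num_taps, beta=1.0):
    """First L columns of the unitary N-point DFT matrix, scaled by sqrt(beta).
    Sign convention exp(-j*2*pi*n*l/N) is an assumption."""
    scale = math.sqrt(beta / num_subchannels)
    return [[scale * cmath.exp(-2j * math.pi * n * l / num_subchannels)
             for l in range(num_taps)]
            for n in range(num_subchannels)]

def subchannel_gains(G, h):
    """Frequency-domain gains H = G h for one user."""
    return [sum(g_nl * h_l for g_nl, h_l in zip(row, h)) for row in G]

def squared_subchannel_gains(G, h):
    """SSGs gamma_n = |H_n|^2 under unit-variance noise."""
    return [abs(H_n) ** 2 for H_n in subchannel_gains(G, h)]
```

Because the columns of the unitary DFT matrix are orthonormal, the SSGs sum to the energy of the impulse response when `beta` is one, which gives a quick sanity check.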

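Recall that $f^t_{n,k}$, the feedback received about user $k$ on channel $n$ at time $t$, takes values from
the set $\{0, 1, \emptyset\}$, where $0$ denotes a NAK, $1$ denotes an ACK, and $\emptyset$ denotes that user $k$ was not
scheduled on subchannel $n$ at time $t$.

³ The extension to higher-order Markov channels is straightforward.

Assuming that, conditioned on $\boldsymbol{h}^t_k$, the feedbacks generated
by user $k$ are independent across subchannels, we have

$$
p(\boldsymbol{f}^t_k \mid \boldsymbol{h}^t_k, \boldsymbol{I}^t, \boldsymbol{P}^t) = \prod_{n=1}^{N} p(f^t_{n,k} \mid \boldsymbol{h}^t_k, \boldsymbol{I}^t, \boldsymbol{P}^t),
\tag{38}
$$

$$
p(f^t_{n,k} = f \mid \boldsymbol{h}^t_k, \boldsymbol{I}^t, \boldsymbol{P}^t) =
\begin{cases}
\sum_m I^t_{n,k,m}\, a_m e^{-b_m P^t_{n,k,m} \gamma^t_{n,k}} & \text{if } f = 0 \\
\sum_m I^t_{n,k,m} \big(1 - a_m e^{-b_m P^t_{n,k,m} \gamma^t_{n,k}}\big) & \text{if } f = 1 \\
1 - \sum_m I^t_{n,k,m} & \text{if } f = \emptyset,
\end{cases}
\tag{39}
$$

where $\gamma^t_{n,k} = |H^t_{n,k}|^2$ can be determined from $\boldsymbol{h}^t_k$ via (32). Together, (33)-(39) suggest a method
of recursively updating the channel distributions, using the new feedback obtained at each time
$t$, which is given in Table III.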
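The feedback likelihood in (38)-(39) is simple to evaluate once the SSGs are known. The Python sketch below is illustrative only: the encoding of "not scheduled" as `None`, the list-based arguments, and the function names are assumptions.

```python
import math

ACK, NAK, NOT_SCHEDULED = 1, 0, None  # assumed encoding of the three feedback values

def feedback_likelihood(feedback, ssg, schedule_row, power_row, mcs_list):
    """Probability of one feedback value on one (subchannel, user) pair.

    schedule_row[m] in {0, 1} encodes I_{n,k,m}; power_row[m] encodes P_{n,k,m};
    mcs_list[m] = (a_m, b_m). At most one entry of schedule_row may be 1.
    """
    scheduled = sum(schedule_row)
    error = sum(i * a * math.exp(-b * p * ssg)
                for i, p, (a, b) in zip(schedule_row, power_row, mcs_list))
    if feedback is NOT_SCHEDULED:
        return 1.0 - scheduled
    if feedback == NAK:
        return error
    return scheduled - error  # ACK: scheduled and decoded correctly

def user_likelihood(feedbacks, ssgs, schedule, powers, mcs_list):
    """Product over subchannels, using conditional independence given the channel."""
    likelihood = 1.0
    for f, g, s_row, p_row in zip(feedbacks, ssgs, schedule, powers):
        likelihood *= feedback_likelihood(f, g, s_row, p_row, mcs_list)
    return likelihood
```

For any fixed schedule, the three values returned for ACK, NAK, and "not scheduled" sum to one, so the function defines a valid probability mass function over the feedback alphabet.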

We now propose the use of particle filtering [24] to circumvent the evaluation of multidimensional
integrals in the recursion of Table III. Particle filtering is a well-known technique that
approximates the pdf of a random variable using a suitably chosen probability mass function
(pmf). In the sequel, for simplicity of illustration, we assume a Gauss-Markov model of the
form

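$$h_{l,k}^{t+1} = (1-a)\, h_{l,k}^{t} + a\, w_{l,k}^{t}, \tag{40}$$

where $w_{l,k}^t$ is unit-variance circular Gaussian and $a \in (0, 1]$ is a known constant that determines
the fading rate. Here, $w_{l,k}^t$ is assumed to be i.i.d. for all $t, l, k$. At each time-step $t$, for $k \in
\{1, \ldots, K\}$, we use $S$ particles in the approximations

$$p(\mathbf{h}_k^{t-d} \mid \mathcal{F}_{-\infty}^{t-d}) \approx \sum_{i=1}^{S} v_k^{t-d|t-d}[i]\, \delta(\mathbf{h}_k^{t-d} - \mathbf{h}_k^{t-d}[i]) \quad \text{and} \quad
p(\mathbf{h}_k^{t} \mid \mathcal{F}_{-\infty}^{t-d}) \approx \sum_{i=1}^{S} v_k^{t|t-d}[i]\, \delta(\mathbf{h}_k^{t} - \mathbf{h}_k^{t}[i]), \tag{41}$$

where $\mathbf{h}_k^t[i] = [h_{1,k}^t[i], \ldots, h_{L,k}^t[i]]^T \in \mathbb{C}^L$ denotes the $i$th (vector) particle, for $i \in \{1, \ldots, S\}$,
and $v_k^{t_1|t_2}[i] \in \mathbb{R}^+$ is the probability mass assigned to the particle $\mathbf{h}_k^{t_1}[i]$ based on the observations
received up to time $t_2$. The steps to recursively compute these particles and their corresponding
weights are detailed in Table IV.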
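As an illustration of how particles could be drawn and moved forward under the Gauss-Markov model (40), the following Python sketch initializes particles and applies the model repeatedly. The helper names are hypothetical. Initializing from the stationary variance $a/(2-a)$ is an assumption made here; it is the fixed point of $\sigma^2 = (1-a)^2\sigma^2 + a^2$ implied by (40) with unit-variance noise.

```python
import random


def cgauss(var=1.0):
    """Draw one circular complex Gaussian sample with total variance `var`."""
    s = (var / 2.0) ** 0.5
    return complex(random.gauss(0.0, s), random.gauss(0.0, s))


def step(h, a):
    """One application of (40) to every tap: h <- (1 - a) h + a w."""
    return [(1.0 - a) * tap + a * cgauss() for tap in h]


def propagate(h, a, d):
    """Apply (40) d times, i.e. draw from the d-step Markov transition."""
    for _ in range(d):
        h = step(h, a)
    return h


def init_particles(S, L, a):
    """S particles with L taps each, drawn from the assumed stationary law."""
    var = a / (2.0 - a)  # stationary variance of (40), an assumption of this sketch
    return [[cgauss(var) for _ in range(L)] for _ in range(S)]
```

Because the particles start in the stationary distribution, propagating them by any number of steps leaves their empirical variance unchanged up to sampling noise, which is a quick sanity check on the recursion.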
Using the approximation in (41), we note that the expectation of any function of the subchannel-gain,
$\mathbf{h}_k^t$, can be found using

$$E\{A(\mathbf{h}_k^t)\} \approx \sum_{i} v_k^{t|t-d}[i]\, A(\mathbf{h}_k^t[i]), \tag{42}$$

where $A(\cdot)$ is an arbitrary function and $v_k^{t|t-d}[i]$ is the probability mass of particle $\mathbf{h}_k^t[i]$. Recalling
that the SSG $\gamma_{n,k}^t$ is a deterministic function of the subchannel-gain, $\mathbf{h}_k^t$, any function of $\gamma_{n,k}^t$ is
also a function of $\mathbf{h}_k^t$.
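The particle approximation of an expectation is a weighted sum over particles. A small Python sketch, with hypothetical helper names, shows this for the SSG of one subchannel; `g_row` stands for the row of the modulation matrix that maps the impulse response to that subchannel, and the division by the total weight is a safeguard added here in case the weights are not already normalized.

```python
def weighted_expectation(particles, weights, func):
    """Approximate E{func(h)} by a weighted sum over particles."""
    total = sum(weights)
    return sum(w * func(h) for h, w in zip(particles, weights)) / total


def ssg(h, g_row):
    """Squared subchannel gain |g_row . h|^2 for one impulse-response particle."""
    H = sum(g * tap for g, tap in zip(g_row, h))
    return abs(H) ** 2


# Two single-tap particles with masses 0.25 and 0.75.
particles = [[1 + 0j], [0 + 2j]]
weights = [0.25, 0.75]
mean_ssg = weighted_expectation(particles, weights, lambda h: ssg(h, [1.0]))
```

Any other function of the SSG, such as an expected packet error rate, can be passed as `func` in the same way.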

VI. Numerical Results 

In this section, we numerically evaluate the performance of the proposed greedy scheduling 
and resource allocation from Section IV with the posterior update from Section V. For this,
we consider an OFDMA system with independent first-order Gauss-Markov channels (40). We
assumed, if not otherwise stated, $K = 8$ available users, $N = 32$ OFDMA subchannels, channel
fading parameter a = 10~^ and impulse response length L = 2. We used the modulation 
matrix $\mathbf{G} = \sqrt{\beta}\,\mathbf{F} \in \mathbb{C}^{N \times L}$ (recall (31)), where $\mathbf{F}$ contains the first $L$ columns of the
unitary $N$-DFT matrix and the scaling $\beta$ ensures that the variance of $H_{n,k}^t$ is unity for all $(n, k)$.
Thus, the mean of the SSG $\gamma_{n,k}^t$ was also unity for all $(n, k)$. Since the subchannel-averaged
total transmit power equals $\frac{1}{N}\sum_{n,k,m} I_{n,k,m} P_{n,k,m} = X_{\text{con}}/N$, it is readily
seen that the average per-subchannel signal-to-noise ratio is $\text{SNR} = E\{\frac{1}{N}\sum_{n,k,m} I_{n,k,m} P_{n,k,m}\, \gamma_{n,k}\} =
\frac{1}{N}\sum_{n,k,m} I_{n,k,m} P_{n,k,m}\, E\{\gamma_{n,k}\} = X_{\text{con}}/N$. For the plots, we averaged 500 realizations, each with 100
time-slots. Of these 100 time-slots, the first 50 were ignored to avoid transient effects. 
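
The unit-variance normalization of the subchannel gains can be checked with a short derivation. The following is a sketch under stated assumptions, not a quotation of the original formula: the $L$ taps are taken to be independent, zero-mean, and in the stationary regime of the Gauss-Markov model, and the resulting expression for the scaling $\beta$ is derived here from those assumptions.

```latex
% Stationary tap variance under h^{t+1} = (1-a) h^t + a w^t with unit-variance w:
\sigma_h^2 = (1-a)^2 \sigma_h^2 + a^2
\quad\Longrightarrow\quad
\sigma_h^2 = \frac{a^2}{1-(1-a)^2} = \frac{a}{2-a}.

% Each entry of the unitary N-DFT matrix has squared magnitude 1/N, so with
% G = \sqrt{\beta} F and L independent taps:
\operatorname{var}\{H_{n,k}^t\}
  = \beta \sum_{l=1}^{L} |F_{n,l}|^2 \, \sigma_h^2
  = \beta \, \frac{L}{N} \, \frac{a}{2-a}.

% Setting the variance to one gives the scaling (derived, under the assumptions above):
\beta = \frac{N(2-a)}{aL}.
```

With unit-variance $H_{n,k}^t$, the SSG $|H_{n,k}^t|^2$ has unit mean, which is what makes the per-subchannel SNR equal to the per-subchannel transmit power.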

For illustrative purposes, we assumed uncoded $2^{m+1}$-QAM signaling with MCS index $m \in
\{1, \ldots, 15\}$. In this case, we have $m + 1$ bits per symbol, one symbol per "codeword,"
and one codeword per packet. In the packet error-rate model $\epsilon = a_m e^{-b_m P \gamma}$, we assumed $a_m =
1$ and $b_m = 1.5/(2^{m+1} - 1)$ because the symbol error-rate of a $2^{m+1}$-QAM system is well
approximated by $\exp(-1.5 P\gamma/(2^{m+1} - 1))$ in the high-$(P\gamma)$ regime [25] and is 1 when
$P\gamma = 0$. Throughout, we used the identity utility (i.e., $U_{n,k,m}(x) = x$ for all $n, k, m$) so that the
objective was maximization of sum goodput, and we assumed a feedback delay of $d = 1$.
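
To make the error-rate model concrete, the following Python sketch evaluates the packet error rate and the resulting goodput for each MCS index, and picks the index with the largest goodput. It is an illustration only: it assumes the product of power and SSG is known exactly, whereas the proposed scheme works with posterior distributions, and the function names are hypothetical.

```python
import math


def packet_error_rate(m, p_gamma):
    """Error rate exp(-b_m * P * gamma) with a_m = 1 and b_m = 1.5 / (2^(m+1) - 1)."""
    b_m = 1.5 / (2 ** (m + 1) - 1)
    return math.exp(-b_m * p_gamma)


def goodput(m, p_gamma):
    """Bits per packet (m + 1) times the probability of successful delivery."""
    return (m + 1) * (1.0 - packet_error_rate(m, p_gamma))


def best_mcs(p_gamma, mcs_indices=range(1, 16)):
    """MCS index with the largest goodput for a known power-times-SSG product."""
    return max(mcs_indices, key=lambda m: goodput(m, p_gamma))
```

At zero received power every MCS has error rate one and goodput zero, while larger values of the power-SSG product favor denser constellations.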

The performance of the proposed greedy algorithm was compared to three reference schemes: 
fixed-power random user scheduling (FP-RUS), the "causal global genie" (CGG), and the "non- 
causal global genie" (NCGG). The FP-RUS scheme schedules users uniformly at random, allo- 
cates power uniformly across subchannels, and selects the MCS to maximize expected goodput. 
The FP-RUS, which makes no use of feedback, should perform no better than any feedback-based
scheme. The CGG (recall Section III-A) performs optimal scheduling and resource allocation
under perfect knowledge of all SSGs at the previous time-instant (since $d = 1$), i.e., given
$\{\gamma_{n,k}^{t-1}\ \forall n, k\}$ at time $t$. From Lemma 1, we know that the CGG upper-bounds the POMDP. The
NCGG is similar to the CGG, but assumes perfect knowledge of all SSGs at all times, i.e.,
given $\{\gamma_{n,k}^{\tau}\ \forall n, k, \tau\}$ at time $t$. Thus, it provides an upper bound on the CGG that is invariant to
fading rate a. The NCGG has a greedy implementation, like the CGG, but without the conditional 
expectation in dS]). 

Figure 1 shows a typical realization of instantaneous sum-goodput versus time $t$, when a =
10^'^. There, one can see a large gap between the FP-RUS and the CGG, and a much smaller 
gap between the CGG and the NCGG. The proposed scheme starts without CSI, and initially 
performs no better than the FP-RUS. From ACK/NAK feedbacks, however, it quickly learns the 
CSI well enough to perform scheduling and resource allocation at a level that yields sum-goodput 
much closer to the CGG than to the FP-RUS. 

Figure 2 plots average sum-goodput versus the number of particles $S$ used to update the
posterior distributions in the proposed greedy scheme (recall Section V). There we see that the
performance of the proposed scheme increases with $S$, but shows little improvement for $S > 30$.
Thus, $S = 30$ particles were used to construct the other plots. Remarkably, with only $S = 5$
particles, the proposed algorithm captures a significant portion of the maximum possible goodput 
gain over the FP-RUS. 

Figure 3 plots average sum-goodput versus the fading rate a. There we see that, at low
fading rates (i.e., small a), the proposed greedy scheme achieves an average sum-goodput that 
is much higher than the FP-RUS and, in fact, not far from the CGG upper bound. For instance, 
at a = 10""^, the sum-goodput attained by the proposed scheme is 92% of the upper bound 
and 170% of that attained by the FP-RUS. As the fading rate a increases, we see that the 
sum-goodput attained by the proposed scheme decreases, and eventually converges to that of 
the FP-RUS. This behavior is due to the fact that, as a increases, it becomes more difficult to 
predict the SSGs using delayed ACK/NAK feedback, thereby compromising the scheduling-and-resource-allocation
decisions that are made based on the predicted SSGs. In fact, one can even
observe a gap between the CGG and NCGG for large a because, even with delayed perfect-SSG 
feedback, the current SSGs are difficult to predict. 
Figure 3 reveals a gap between the proposed scheme and the CGG bound that persists as
$a \to 0$. This non-vanishing gap can be attributed, at least in part, to greedy scheduling under
ACK/NAK feedback. Intuitively, we have the following explanation. Because the inferred SSG-
distributions of not-recently-scheduled users quickly revert to their a priori form, the proposed
greedy algorithm will continue to schedule users as long as their SSGs remain better than the
a priori value. There may exist, however, not-recently-scheduled users with far better SSGs who
remain invisible to the proposed scheme, only because they have not recently been scheduled.

Figures 4 and 5 plot average sum-goodput versus the number of subchannels (i.e., total
bandwidth) $N$. In Fig. 4, the total BS power $X_{\text{con}}$ is scaled with $N$ such that the per-subchannel
SNR remains fixed at 10 dB, whereas, in Fig. 5, the total BS power $X_{\text{con}}$ remains invariant
to the bandwidth $N$, and is set such that per-subchannel SNR = 10 dB for $N = 32$. In both
cases, the average sum-goodput increases with bandwidth $N$, as expected, since the availability
of more subchannels increases not only scheduling flexibility, but also the possibility of stronger
subchannels, which can be exploited by the BS. In Fig. 4, where the per-subchannel SNR is fixed,
the sum-goodput increases linearly with bandwidth N, as expected. In all cases, the proposed 
greedy scheme captures about 80% of the sum-goodput gain achievable over the FP-RUS. 

Figure 6 plots average sum-goodput versus the number of available users $K$. It shows that,
as K increases, the average sum-goodputs achieved by the NCGG, CGG, and the proposed 
greedy schemes increase, whereas that achieved by the FP-RUS remains constant. This behavior 
results because, with the former schemes, the availability of more users can be exploited to 
schedule users with stronger subchannels, whereas with the FP-RUS scheme, this advantage is 
lost due to the complete lack of information about the users' instantaneous channel conditions. 
Figure 6 also suggests that, as $K$ increases, the sum-goodput of the proposed greedy scheme
saturates. This can be attributed to the fact that the proposed greedy algorithm can only track 
the channels of recently scheduled users, and thus cannot benefit directly from the growing pool 
of not-recently-scheduled users. 

In Figure 7, the top subplot shows average sum-goodput versus SNR, while the bottom subplot
shows the average value of the bound (30) on the optimality gap of our proposed approach to the
GSRA problem, also versus SNR. The top plot shows that, as the SNR increases, the proposed
greedy scheme continues to perform much closer to the NCGG/CGG bounds than it does to the
FP-RUS scheme. The bottom plot establishes that the sum-goodput loss due to the sub-optimality
in the algorithm used to attack the GSRA problem is negligible, e.g., at most 0.0025% over all
SNR.

VII. Conclusion 

In this paper, we considered the problem of joint scheduling and resource allocation in the 
OFDMA downlink under ACK/NAK feedback, with the goal of maximizing an expected long- 
term goodput-based utility subject to an instantaneous sum-power constraint. First, we established 
that the optimal solution to the problem is a partially observable Markov decision process 
(POMDP), which is impractical to implement. Consequently, we proposed a greedy approach to 
joint scheduling and resource allocation based on the posterior distributions of the squared sub- 
channel gain (SSG) for every user/subchannel pair, which has polynomial complexity. Next, for 
Markov channels, we outlined a recursive method to update the posterior SSG distributions from 
the ACK/NAK feedbacks received at each time-slot, and proposed an efficient implementation 
based on particle filtering. To gauge the performance of our greedy scheme relative to that of the 
optimal POMDP (which is impossible to implement), we derived a performance upper-bound 
on the POMDP, known as the causal global genie (CGG). Numerical experiments suggest that our
greedy scheme achieves a significant fraction of the maximum possible performance gain over 
fixed-power random user scheduling (FP-RUS), despite its low-complexity implementation. For 
example, a representative simulation using $N = 32$ OFDMA subchannels, $K = 8$ available
users, SNR = 10 dB, and $S = 30$ particles, shows that the sum-goodput of the proposed scheme
is 92% of the upper bound and 170% of that attained by the FP-RUS (see Fig. 3).
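TABLE I

Brute-force steps for a given $\mathbf{I}$

1) Initialize $\underline{\mu} = \mu_{\min}$ and $\bar{\mu} = \mu_{\max}$.

2) Set $\mu = \frac{\underline{\mu} + \bar{\mu}}{2}$.

3) For each $(n, k, m)$,

a) Use (11)-(12) to obtain $P^*_{n,k,m}(\mu)$.

4) Calculate $X^*_{\text{tot}}(\mathbf{I}, \mu) = \sum_{n,k,m} I_{n,k,m} P^*_{n,k,m}(\mu)$.

5) If $X^*_{\text{tot}}(\mathbf{I}, \mu) > X_{\text{con}}$, set $\underline{\mu} = \mu$, otherwise set $\bar{\mu} = \mu$.

6) If $\bar{\mu} - \underline{\mu} > \kappa$, go to step 2), else proceed to step 7).

7) If $X^*_{\text{tot}}(\mathbf{I}, \underline{\mu}) \neq X^*_{\text{tot}}(\mathbf{I}, \bar{\mu})$, set $\lambda = \frac{X^*_{\text{tot}}(\mathbf{I}, \underline{\mu}) - X_{\text{con}}}{X^*_{\text{tot}}(\mathbf{I}, \underline{\mu}) - X^*_{\text{tot}}(\mathbf{I}, \bar{\mu})}$, otherwise set $\lambda = 0$.

8) Set $\mu_{\mathbf{I}} = \bar{\mu}$. The best power allocation is given by $\mathbf{P}(\mathbf{I}) = \lambda \mathbf{P}^*(\bar{\mu}) + (1 - \lambda)\mathbf{P}^*(\underline{\mu})$, and $L_{\mathbf{I}} = L(\mu_{\mathbf{I}}, \mathbf{P}(\mathbf{I}))$ gives
the best Lagrangian value.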
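
The bisection in Table I can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `power_at` is a hypothetical callback standing in for the per-entry power rule, the total power is assumed to be non-increasing in the multiplier, and the initial interval is assumed to bracket the power constraint. The toy `waterfill` rule used in the example is not the paper's power rule; it only provides a monotone function to exercise the search.

```python
def allocate_power(power_at, x_con, mu_min, mu_max, kappa=1e-9):
    """Bisection on the multiplier mu, then interpolation to meet x_con.

    power_at(mu) returns the list of powers of the scheduled entries.
    Returns the interpolated power list and the upper multiplier.
    """
    lo, hi = mu_min, mu_max
    while hi - lo > kappa:
        mu = 0.5 * (lo + hi)
        if sum(power_at(mu)) > x_con:
            lo = mu            # too much power: raise the lower end
        else:
            hi = mu            # constraint met: lower the upper end
    p_lo, p_hi = power_at(lo), power_at(hi)
    x_lo, x_hi = sum(p_lo), sum(p_hi)
    # Convex combination whose total power equals x_con.
    lam = (x_lo - x_con) / (x_lo - x_hi) if x_lo != x_hi else 0.0
    powers = [lam * ph + (1.0 - lam) * pl for pl, ph in zip(p_lo, p_hi)]
    return powers, hi


def waterfill(gains):
    """Hypothetical monotone power rule, used only to exercise the bisection."""
    return lambda mu: [max(0.0, 1.0 / mu - 1.0 / g) for g in gains]


powers, mu_final = allocate_power(waterfill([1.0, 0.25]), 1.0, 1e-3, 10.0)
```

The interpolation step matters because the total power may jump between the two ends of the final interval; mixing the two allocations meets the constraint with equality.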



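TABLE II
Proposed greedy algorithm

1) Initialize $\underline{\mu} = \mu_{\min}$ and $\bar{\mu} = \mu_{\max}$.

2) Set $\mu = \frac{\underline{\mu} + \bar{\mu}}{2}$.

3) For each subchannel $n = 1, \ldots, N$:

a) For each $(k, m)$,

i) Use (21)-(22) to calculate $P^*_{n,k,m}(\mu)$.

ii) Use $P^*_{n,k,m}(\mu)$ to calculate the associated per-$(n,k,m)$ quantity needed in step b).

b) Calculate $S^*_n(\mu)$ using (24).

4) Find $\mathbf{I}^{\text{pro}}(\mu)$ using (28).

5) Calculate $X^*_{\text{tot}}(\mu) = \sum_{n,k,m} I^{\text{pro}}_{n,k,m}(\mu)\, P^*_{n,k,m}(\mu)$.

6) If $X^*_{\text{tot}}(\mu) > X_{\text{con}}$, set $\underline{\mu} = \mu$, otherwise set $\bar{\mu} = \mu$.

7) If $\bar{\mu} - \underline{\mu} > \kappa$, go to step 2), else proceed to step 8).

8) Now we have $\mu^* \in [\underline{\mu}, \bar{\mu}]$ and $\bar{\mu} - \underline{\mu} < \kappa$. For both $\mathbf{I} = \mathbf{I}^{\text{pro}}(\underline{\mu})$ and $\mathbf{I} = \mathbf{I}^{\text{pro}}(\bar{\mu})$ (since they may differ), calculate
$\mathbf{P}(\mathbf{I})$ and $L_{\mathbf{I}}$ as described for the brute force algorithm.

9) Choose $\hat{\mathbf{I}} = \arg\min_{\mathbf{I} \in \{\mathbf{I}^{\text{pro}}(\underline{\mu}),\, \mathbf{I}^{\text{pro}}(\bar{\mu})\}} L_{\mathbf{I}}$ as the user-MCS allocation and $\hat{\mathbf{P}} = \mathbf{P}(\hat{\mathbf{I}})$ as the associated power
allocation.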
TABLE III

Recursive update of channel posteriors

At time $t$, for each user $k$, the pdf $p(\mathbf{h}_k^{t-d-1} \mid \mathcal{F}_{-\infty}^{t-d-1})$ is available from the previous time-instant. The user-$k$ recursion is then

1) Observe new feedbacks $\mathbf{f}_k^{t-d} \in \{0, 1, \emptyset\}^N$.

2) Compute $p(\mathbf{h}_k^{t-d} \mid \mathcal{F}_{-\infty}^{t-d-1})$ using (37).

3) Compute $p(\mathbf{f}_k^{t-d} \mid \mathbf{h}_k^{t-d}, \mathbf{I}^{t-d}, \mathbf{P}^{t-d})$ using the error-rate rule (38)-(39).

4) Using the distributions obtained in steps 2) and 3), compute $p(\mathbf{h}_k^{t-d} \mid \mathcal{F}_{-\infty}^{t-d})$ via the Bayes-rule step in (36).

5) Compute $p(\mathbf{h}_k^{t} \mid \mathcal{F}_{-\infty}^{t-d})$ using the Markov-prediction step (34).

6) For each $n$, compute $p(\gamma_{n,k}^{t} \mid \mathcal{F}_{-\infty}^{t-d})$ via (33).



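TABLE IV

Particle filtering steps

Let the system begin at time-instant $t_0$. If $t \in \{t_0, \ldots, t_0 + d - 1\}$:

1) Initialize $\{h_{l,k}^t[i]\ \forall l, k, i\}$ by drawing i.i.d. samples from $\mathcal{CN}(0, \frac{a}{2-a})$.

2) Set the importance weights to $\frac{1}{S}$ $\forall k, i$.

For any other time-instant $t$ ($\geq t_0 + d$):

1) Using the previous samples $\{h_{l,k}^{\tau}[i]\ \forall l, k, i, \tau : \tau \leq t - d\}$, obtain new samples according to the underlying Markov
model as follows

$$h_{l,k}^{t}[i] = (1-a)^d\, h_{l,k}^{t-d}[i] + a \sum_{j=0}^{d-1} (1-a)^j\, w_{l,k}^{t-1-j}[i],$$

where $w_{l,k}^{t-1-j}[i]$ is drawn i.i.d. from $\mathcal{CN}(0, 1)$ for all $i, l, k, j$.

2) For each user $k$,

a) Using the received feedbacks $\mathbf{f}_k^{t-d}$ and the set of importance weights from time $(t - d - 1)$, i.e.,
$\{v_k^{t-d-1|t-d-1}[i]\ \forall i\}$, compute the new set of importance weights at time $(t - d)$ using

$$v_k^{t-d|t-d}[i] = v_k^{t-d-1|t-d-1}[i]\; p(\mathbf{f}_k^{t-d} \mid \mathbf{h}_k^{t-d} = \mathbf{h}_k^{t-d}[i], \mathbf{I}^{t-d}, \mathbf{P}^{t-d})$$

for all $i$, where $p(\mathbf{f}_k^{t-d} \mid \mathbf{h}_k^{t-d} = \mathbf{h}_k^{t-d}[i], \mathbf{I}^{t-d}, \mathbf{P}^{t-d})$ is given by (38)-(39).

b) Normalize the weights via

$$v_k^{t-d|t-d}[i] \leftarrow \frac{v_k^{t-d|t-d}[i]}{\sum_{i'} v_k^{t-d|t-d}[i']} \quad \forall i.$$

c) Compute the weights for the posterior distribution, $p(\mathbf{h}_k^t \mid \mathcal{F}_{-\infty}^{t-d})$, using

$$v_k^{t|t-d}[i] = \sum_{j=1}^{S} p(\mathbf{h}_k^{t} = \mathbf{h}_k^{t}[i] \mid \mathbf{h}_k^{t-d} = \mathbf{h}_k^{t-d}[j])\; v_k^{t-d|t-d}[j].$$

d) Normalize the weights via

$$v_k^{t|t-d}[i] \leftarrow \frac{v_k^{t|t-d}[i]}{\sum_{i'} v_k^{t|t-d}[i']} \quad \forall i.$$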
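
The weight computations of the particle filter can be sketched in Python as below. This is a hedged illustration with hypothetical function names. The measurement update multiplies each previous weight by the ACK/NAK likelihood of its particle and renormalizes. The prediction update is one possible reading of the prediction step, assumed here to sum the Gauss-Markov transition density over the previous particles; the transition density itself follows from applying the first-order model $d$ times.

```python
import math


def normalize(weights):
    """Scale weights so that they sum to one."""
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("weights must have positive total mass")
    return [w / total for w in weights]


def measurement_update(prev_weights, likelihoods):
    """New weight = old weight times feedback likelihood, then normalize."""
    return normalize([w * l for w, l in zip(prev_weights, likelihoods)])


def transition_density(h_new, h_old, a, d):
    """Density of h_new given h_old after d steps of h <- (1 - a) h + a w."""
    var = a * a * sum((1.0 - a) ** (2 * j) for j in range(d))
    shrink = (1.0 - a) ** d
    dist2 = sum(abs(x - shrink * y) ** 2 for x, y in zip(h_new, h_old))
    return math.exp(-dist2 / var) / (math.pi * var) ** len(h_new)


def prediction_update(new_particles, old_particles, old_weights, a, d):
    """Assumed reading: weight of new particle i is a sum over old particles j."""
    raw = [sum(transition_density(hn, ho, a, d) * w
               for ho, w in zip(old_particles, old_weights))
           for hn in new_particles]
    return normalize(raw)
```

A particle whose channel value makes the observed feedback unlikely loses mass in the measurement update, which is how ACKs and NAKs sharpen the posterior of recently scheduled users.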
Fig. 2. Average sum-goodput versus the number of particles used to update the channel posteriors. Here, $N = 32$, $K = 8$,
SNR = 10 dB, and a = 10"^
Fig. 3. Average sum-goodput versus fading rate a. Here, $N = 32$, $K = 8$, SNR = 10 dB, and $S = 30$.
Fig. 4. Average sum-goodput versus number of subchannels $N$. Here, $K = 8$, SNR = 10 dB, a = 10"^, and $S = 30$.
Fig. 5. Average sum-goodput versus number of subchannels $N$. Here, $K = 8$, $X_{\text{con}}$ does not scale with $N$ and it is chosen
such that SNR = 10 dB for $N = 32$, a = 10"^ and $S = 30$.
Fig. 6. Average sum-goodput versus number of users. In this plot, $N = 32$, SNR = 10 dB, a = 10"^ and $S = 30$.
Fig. 7. The top plot shows the average sum-goodput as a function of SNR. The bottom plot shows the average bound on the
optimality gap between the proposed and optimal greedy solutions (given in (30)), i.e., the average value of $(\mu^* - \mu_{\min})(X_{\text{con}} -
X^*_{\text{tot}}(\mu^*))$. In this plot, $N = 32$, $K = 8$, a = 10"^ and $S = 30$.